Fix race conditions in StateManagerRedis lock detection#6196

Merged
masenf merged 3 commits into main from masenf/statemanager-redis-race
Mar 19, 2026
Conversation

@masenf
Collaborator

@masenf masenf commented Mar 19, 2026

Fix other race conditions also present in the tests.

In short: both the framework and tests must deterministically wait for the Redis pubsub psubscribe call to complete to ensure that events, such as lock release, are properly recorded.
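The idea can be sketched as a minimal asyncio barrier. This is a hedged illustration, not the actual Reflex code: `psubscribe` here is a stand-in coroutine for the real Redis pubsub round-trip, and the class name is invented.

```python
import asyncio

class LockWatcher:
    """Minimal sketch of the subscription barrier, assuming a pubsub-like API."""

    def __init__(self) -> None:
        self.subscribed = asyncio.Event()

    async def psubscribe(self) -> None:
        await asyncio.sleep(0)  # placeholder for the real Redis round-trip

    async def run_subscriber(self) -> None:
        try:
            await self.psubscribe()
            # Only signal readiness AFTER the subscription round-trip completes,
            # so no lock-release event can be published before we are listening.
            self.subscribed.set()
            await asyncio.sleep(0.01)  # placeholder for the message loop
        finally:
            # If the subscriber exits, future waiters must wait again.
            self.subscribed.clear()

    async def wait_for_lock(self) -> str:
        # Block until the subscription is live; otherwise a lock-release
        # notification could fire before the handler is registered.
        await self.subscribed.wait()
        return "lock acquired"

async def main() -> str:
    watcher = LockWatcher()
    task = asyncio.create_task(watcher.run_subscriber())
    result = await watcher.wait_for_lock()
    await task
    return result

print(asyncio.run(main()))  # → lock acquired
```

The same ordering applies on both sides: the framework gates its lock waiter on the event, and the tests gate their assertions on an analogous notification.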

adhami3310
adhami3310 previously approved these changes Mar 19, 2026
@codspeed-hq

codspeed-hq bot commented Mar 19, 2026

Merging this PR will not alter performance

✅ 8 untouched benchmarks


Comparing masenf/statemanager-redis-race (797c8a1) with main (0a5e76e)


@greptile-apps
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR fixes race conditions in StateManagerRedis lock detection by introducing a synchronization barrier (_lock_updates_subscribed asyncio.Event) that ensures the Redis pubsub subscription is fully established before any lock-waiting logic can proceed, preventing missed lock-release notifications. The test infrastructure is also hardened with a complementary event_log_on_update event that gives tests a deterministic way to wait for Redis side-effects rather than relying on timing.

Key changes:

  • redis.py: Adds _lock_updates_subscribed event; _subscribe_lock_updates sets it after psubscribe completes and clears it in a finally block on exit; _wait_for_lock awaits the event before registering a lock waiter.
  • mock_redis.py: Extracts _event_log_append_notify helper; adds event_log_on_update to _internals; adds pubsub_ready barrier to real_redis() so the fixture yields only after the monitoring subscription is active.
  • test_redis.py: Adopts event_log_on_update.clear()/wait() pattern for deterministic assertions; promotes SubState1/SubState2 to module level to avoid duplicate state class registrations across test runs.
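The `clear()`/`wait()` pattern the tests adopt can be sketched as follows. This is illustrative only: `event_log_on_update` and `append_event` mirror the names in the PR, but the bodies are stand-ins for the real mock-Redis helpers.

```python
import asyncio

# Sketch of the deterministic test pattern: instead of sleeping and hoping
# the side-effect has landed, the test arms the event, triggers the
# operation, and awaits the notification.

event_log: list[str] = []
event_log_on_update = asyncio.Event()

def append_event(entry: str) -> None:
    event_log.append(entry)
    event_log_on_update.set()  # notify any test waiting on the update

async def test_style_wait() -> str:
    event_log_on_update.clear()  # arm BEFORE triggering the operation
    asyncio.get_running_loop().call_soon(append_event, "lock del")
    await asyncio.wait_for(event_log_on_update.wait(), timeout=1.0)
    return event_log[-1]  # deterministic: the entry is guaranteed present

print(asyncio.run(test_style_wait()))  # → lock del
```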

Confidence Score: 3/5

  • Safe to merge once the unbounded wait risk in _wait_for_lock is addressed or consciously accepted.
  • The core race-condition fix is correct and well-motivated. The one notable concern is that await self._lock_updates_subscribed.wait() has no timeout: if _subscribe_lock_updates exhausts its retry budget (5 failures by default), the event is never set and modify_state hangs indefinitely with no timeout or error propagation. The rest of the changes — mock infrastructure improvements and test determinism — are clean and well-structured.
  • Pay close attention to reflex/istate/manager/redis.py line 983 — the unbounded wait() with no timeout guard.

Important Files Changed

Filename Overview
reflex/istate/manager/redis.py Adds _lock_updates_subscribed asyncio.Event to gate lock-waiting until the pubsub subscription is active. The fix correctly solves the race condition, but the new await self._lock_updates_subscribed.wait() call (line 983) has no timeout, potentially causing an indefinite hang if the subscriber task exhausts its retry limit.
tests/units/mock_redis.py Extracts _event_log_append_notify helper to unify event log appending; adds event_log_on_update event exposed via _internals so tests can deterministically wait for Redis side-effects. Also adds pubsub_ready synchronization to real_redis(). Clean, well-structured changes.
tests/units/istate/manager/test_redis.py Adds event_log_on_update fixture and updates tests to use clear()/wait() pairs for deterministic assertion timing. Moves SubState1/SubState2 from inside a test function to module level to avoid repeated class registration in the Reflex state registry across test runs.
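The `pubsub_ready` synchronization mentioned for `real_redis()` follows the same barrier idea at fixture scope: yield to the test only after the monitoring subscription is confirmed. A hedged sketch, with `start_monitor` standing in for the real subscription setup:

```python
import asyncio
from typing import AsyncIterator

async def real_redis_like() -> AsyncIterator[str]:
    """Fixture-style sketch: do not hand control to the test body until the
    monitoring subscription barrier is set."""
    pubsub_ready = asyncio.Event()

    async def start_monitor() -> None:
        await asyncio.sleep(0)   # placeholder: psubscribe round-trip
        pubsub_ready.set()       # subscription confirmed

    monitor = asyncio.create_task(start_monitor())
    await pubsub_ready.wait()    # barrier: subscription is active from here on
    try:
        yield "redis-client"     # the test body runs while suspended here
    finally:
        monitor.cancel()

async def demo() -> str:
    agen = real_redis_like()
    client = await agen.__anext__()
    await agen.aclose()
    return client

print(asyncio.run(demo()))  # → redis-client
```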

Sequence Diagram

sequenceDiagram
    participant C as Caller (modify_state)
    participant SMR as StateManagerRedis
    participant ET as _lock_task (ensure_task)
    participant PS as Redis pubsub

    C->>SMR: _wait_for_lock(lock_key, lock_id)
    SMR->>SMR: _try_get_lock() → fails (lock held)
    SMR->>ET: _ensure_lock_task()
    Note over ET: Creates task if not running
    ET->>PS: _enable_keyspace_notifications()
    ET->>PS: pubsub.psubscribe(**handlers)
    PS-->>ET: subscription confirmed
    ET->>SMR: _lock_updates_subscribed.set()
    SMR->>SMR: await _lock_updates_subscribed.wait() ✓
    SMR->>SMR: _lock_waiter(lock_key) registered
    SMR->>SMR: _request_lock_release(lock_key, lock_id)
    PS-->>ET: keyspace event: lock del
    ET->>SMR: _handle_lock_release(event)
    SMR->>SMR: lock_released_event.set()
    SMR->>SMR: _try_get_lock() → success
    SMR-->>C: lock acquired

Last reviewed commit: "Fix race conditions ..."

masenf and others added 2 commits March 19, 2026 14:25
Fix other race conditions also present in the tests.

In short: both the framework and tests must deterministically wait for the
Redis pubsub psubscribe call to complete to ensure that events, such as lock
release, are properly recorded.

Co-authored-by: Farhan Ali Raza <farhanalirazaazeemi@gmail.com>
@masenf masenf force-pushed the masenf/statemanager-redis-race branch from 8d22705 to 4eaf1c3 on March 19, 2026 21:26
Ignore TimeoutError for this call, because we still want to enter the lock
waiter loop even if there was some Redis problem; the waiter might just have
to wait for the full lock expiration (falling back to the previous behavior).

But we should _NEVER_ deadlock the process waiting for the psubscribe event,
and we should _NEVER_ exit _wait_for_lock just because the timeout for the
previous wait expired.
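That approach can be sketched with a bounded wait whose timeout is swallowed. This is an illustration of the described behavior, not the merged code; the timeout value and names are assumptions.

```python
import asyncio
import contextlib

async def wait_for_subscription(subscribed: asyncio.Event, timeout: float = 1.0) -> bool:
    """Bound the barrier wait and suppress TimeoutError, so a Redis problem
    degrades to waiting out the full lock expiration instead of deadlocking."""
    with contextlib.suppress(asyncio.TimeoutError):
        await asyncio.wait_for(subscribed.wait(), timeout=timeout)
    # Proceed either way: if the event never fired, the waiter simply falls
    # back to waiting for lock expiration (the previous behavior).
    return subscribed.is_set()

async def demo() -> tuple[bool, bool]:
    never_set = asyncio.Event()
    already_set = asyncio.Event()
    already_set.set()
    return (
        await wait_for_subscription(never_set, timeout=0.01),
        await wait_for_subscription(already_set, timeout=0.01),
    )

print(asyncio.run(demo()))  # → (False, True)
```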
@masenf masenf merged commit bd93af0 into main Mar 19, 2026
47 checks passed
@masenf masenf deleted the masenf/statemanager-redis-race branch March 19, 2026 21:47

2 participants